Learning from biased data using

Author

  • A. J. Feelders
Abstract

Databases sometimes contain a non-random sample from the population of interest. This complicates the use of extracted knowledge for predictive purposes. We consider a specific type of biased data that is of considerable practical interest, namely non-random partially classified data. This type of data typically results when some screening mechanism determines whether the correct class of a particular case is known. In credit scoring, the problem of learning from such a biased sample is called “reject inference”, since the class label (e.g. good or bad loan) of rejected loan applications is unknown. We show that maximum likelihood estimation of so-called mixture models is appropriate for this type of data, and discuss an experiment performed on simulated data using mixtures of normal components. The benefits of this approach are shown by comparing its results with those of sample-based discriminant analysis. We also outline how to extend the analysis to allow for non-normal components and missing attribute values, in order to make it suitable for “real-life” biased data.
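As a rough illustration of the idea in the abstract, the sketch below fits a two-component univariate normal mixture by EM to partially classified data and compares the result with class means computed from the classified cases alone. It is not the paper's experiment: the function names, the simulated class means (0 and 3), the unit variances, and the logistic screening rule that decides which labels are observed are all assumptions chosen for illustration.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fit_partially_classified_mixture(x, y, n_iter=200):
    """EM for a two-component univariate normal mixture.

    y holds 0 or 1 for classified cases and -1 where the class is unknown.
    """
    known = np.flatnonzero(y >= 0)
    unknown = np.flatnonzero(y < 0)
    # Start from the classified cases only (the biased, sample-based estimates).
    prior = np.array([np.mean(y[known] == k) for k in (0, 1)])
    mean = np.array([x[y == k].mean() for k in (0, 1)])
    var = np.array([x[y == k].var() for k in (0, 1)])
    resp = np.zeros((x.size, 2))
    resp[known, y[known]] = 1.0  # observed classes are never re-estimated
    for _ in range(n_iter):
        # E-step: posterior class probabilities for the unclassified cases.
        dens = prior * np.column_stack(
            [normal_pdf(x, mean[k], var[k]) for k in (0, 1)]
        )
        resp[unknown] = dens[unknown] / dens[unknown].sum(axis=1, keepdims=True)
        # M-step: weighted estimates over all cases, classified or not.
        weight = resp.sum(axis=0)
        prior = weight / x.size
        mean = resp.T @ x / weight
        var = (resp * (x[:, None] - mean) ** 2).sum(axis=0) / weight
    return prior, mean, var

# Simulated data (assumed setup): two unit-variance normal classes, equal priors.
rng = np.random.default_rng(0)
n = 4000
cls = (rng.random(n) < 0.5).astype(int)
x = rng.normal(loc=3.0 * cls, scale=1.0)

# Hypothetical screening rule: the label is more likely to be observed
# when x is large, so the classified cases are a non-random sample.
observed = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 1.5)))
y = np.where(observed, cls, -1)

prior_em, mean_em, var_em = fit_partially_classified_mixture(x, y)
mean_labeled = np.array([x[y == k].mean() for k in (0, 1)])

print("class means from classified cases only:", mean_labeled)
print("class means from the mixture fit:      ", mean_em)
print("class priors from the mixture fit:     ", prior_em)
```

Because the screening here depends only on the observed attribute `x`, the unclassified cases still carry information about the class-conditional distributions, which is what the mixture fit exploits. The classified-only means are pulled toward the region favoured by the screening rule.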


Similar resources

The Effect of Topic Bias on the Writing Proficiency of Extrovert/Introvert EFL Learners

This study was intended to find out any possible effect of topic bias on the writing proficiency of Iranian extrovert/introvert EFL learners at high/low writing proficiency levels. One hundred participants chosen from among 150 adult language learners on the basis of their personality type (extrovert/introvert) and writing proficiency (high/low) took part in this study. They were arranged into ...


Study of Random Biased d-ary Tries Model

Tries are the most popular data structure on strings. We can construct d-ary tries from strings over an alphabet of size d. Throughout the paper we assume that strings stored in the trie are generated by an appropriate memoryless source. In this paper, with a special combinatorial approach we extend their analysis for average profiles to d-ary tries. We use this combinatorial appr...


On Mining Fuzzy Classification Rules for Imbalanced Data

A fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...


Deep-Treat: Learning Optimal Personalized Treatments from Observational Data using Neural Networks

We propose a novel approach for constructing effective treatment policies when the observed data is biased and lacks counterfactual information. Learning in settings where the observed data does not contain all possible outcomes for all treatments is difficult since the observed data is typically biased due to existing clinical guidelines. This is an important problem in the medical domain as c...


Building a Biased Least Squares Support Vector Machine Classifier for Positive and Unlabeled Learning

Learning from positive and unlabeled examples (PU learning) is a special case of semi-supervised binary classification. The key feature of PU learning is that there is no labeled negative training data, which makes traditional classification techniques inapplicable. Similar to the idea of Biased-SVM, one of the best-known classifiers, a biased least squares support vector machine cl...


Publication date: 1999